Federated information management architecture and system.



A federated architecture and system are extensible and flexible for
integrated access to heterogeneous DataBase Management Systems (DBMS)
dispersed over a long haul network, allowing transparent access to a
wide variety of DBMS while maintaining the local autonomy of the
underlying DBMS. In addition, the system can run on top of different
hardware, operating systems, network communications, and DBMS. The
system can include new target DBMS with minimum changes and is not
limited to integrating relational DBMS; it can also integrate legacy
DBMS such as hierarchical or network DBMS, spatial information systems,
or text retrieval systems.



[FIG. 1. Diagram labels recoverable from the drawing: REQUEST / RESULT; QUERY / RESULTS; DISTRIBUTED INFORMATION MANAGER; DISTRIBUTED PROCESSING COORDINATOR; LOCAL INFORMATION MANAGER (DB2, INGRES, SYBASE, ORACLE; ITASCA); ISTS; KNOWLEDGE-BASED SCHEMA INTEGRATION TOOLS (DATA DISCOVERY, CONCEPT MATCHER, INFERENCE ENGINE); SEMANTIC QUERY PROCESSOR; RESULTS FORMATTER.]






EP 0 625 756 A1 






BACKGROUND OF THE INVENTION 

1. Field of the Invention

This invention relates in general to an architec- 
ture and method useful in computer data networks, 
and, more particularly, to a federated (global) ar- 
chitecture and system which are extensible and 
flexible for providing users with transparent inte- 
grated access to heterogeneous DataBase Man- 
agement Systems (DBMS) dispersed over a long 
haul network. 

2. Description of the Related Art 

During the past decade, large scale organiza- 
tions and environments have initially adopted het- 
erogeneous and incompatible information systems 
in an uncoordinated way, independent of each other
and without consideration that one day they may
need to be integrated. As a result, information 
systems have become more and more complex, 
and are characterized by several types of het- 
erogeneity. For example, different DataBase Man- 
agement Systems (DBMS) models may be used to 
represent data, such as the hierarchical, network, 
and relational models. Aside from databases, many 
software systems (such as spreadsheets, multi- 
media databases and knowledge bases) store other 
types of data, each with its own data model. Fur- 
thermore, the same data may be seen by various 
users at different levels of abstraction. Because of 
such differences, users find it difficult to under- 
stand the meaning of all the types of data pre- 
sented to them. Analysts, operators and current 
data processing technology are not able to or- 
ganize, process and intelligently analyze these di- 
verse and massive quantities of information. Their 
inefficiency often results in late reports to decision 
makers, missed intelligence opportunities and un- 
exploited data. 

One of the needs is to access and manage existing and new earth science
data. The data have been collected and stored within a number of
different DBMSs and image files for the purpose of monitoring global
earth processes. The earth science data are collected by different
information systems, including data concerning climate, land, ocean,
etc., which are composed of relational databases, images, and files.
These systems were designed independently and operate in completely
different ways as to how the data are stored and accessed. Moreover,
they are tailored to different hardware platforms. So, in order to
access the data, the users must learn how to access different systems.
This increases training costs and reduces user productivity. In
addition, the majority of users do not have the level of computer
science expertise necessary to learn the different individual systems
within a short period of time, thus discouraging them from accessing
dispersed data, or in some instances from even knowing what data are
available for their use.

The same problems occur in the Computer Integrated Manufacturing (CIM)
environment. CIM is a very complex network of physical activities,
decision making and information flow. Most manufacturing facilities
contain independently designed and dispersed information bases. In such
an environment, improvement in manufacturing productivity can be
obtained by providing timely access to all essential data, local or
distributed. Present CIM systems lack a federated, i.e., global,
database which contains information required for all phases of
manufacturing, that is, design, process, assembly and inspection.
Usually, manufacturing processes are treated independently from the
other phases. This is undesirable in the sense that data or knowledge
from one process is not available for use by another. There is a need
to integrate data so that they can be made globally available to the
users and processes of a CIM system.

In conclusion, there is an urgent need to integrate these dispersed
data to provide uniform access to the data, to maintain integrity of
the data, and to control its access and use. Rather than requiring
users to learn a variety of interfaces in order to access different
databases, it is preferable that a single interface be made available
which provides access to each of the DBMSs and supports queries which
reference data managed by more than one information system.

Past and current research and development in distributed databases
allows integrated access by providing a homogenizing layer on top of
the underlying information systems (UISs). Common approaches for
supporting this layer focus on defining a single uniform database
language and data model that can accommodate all features of the UISs.
The two main approaches are known as view integration and
multi-database language.

The view integration approach advocates the use of a relational, an
Object-Oriented (OO), or a logic model both for defining views (virtual
or snapshot) on the schemas of more than one target database and for
formulating queries against the views. The view integration approach is
one mechanism for homogenizing the schema incompatibilities of the
UISs. In this framework, all UISs are converted to the equivalent
schemas in the standard relational, OO, or logical data model. The
choice of the uniform data model is based on its expressiveness, its
representation power and its supported environment. This technique is
very powerful from the user's point of view. It insulates the user from
the design and changes of the underlying Information Management System
(IMS). Thus, it allows the user to spend more time in an application
environment. However, the view integration approach has limited
applicability (low degree of heterogeneity) because there are many
situations when the semantics of the data are deeply dependent on the
way in which the applications manipulate it, and are only partially
expressed by the schema. Many recent applications in areas where
traditional DBMSs are not usable fall into this situation (multi-media
applications involving Text, Graphics and Images are typical examples).
In addition, there are no available tools to semi-automate the building
and the maintenance of the unified view, which is vital to the success
of this technique.

In the multi-database language approach, a user, or application, must
understand the contents of each UIS in order to access the shared
information and to resolve conflicts of facts in a manner particular to
each application. There is no global schema to provide advice about the
meta-data. Ease of maintenance and ability to deal with inconsistent
databases make this approach very attractive. The major drawback of
this approach is that the burden of understanding the underlying IMSs
lies on the user. Accordingly, there is a tradeoff between this
multi-database language approach and the view integration approach
discussed above. This invention addresses the deficiencies of the above
two approaches.

OBJECTS AND SUMMARY OF THE INVENTION 



Therefore, it is an object of the present invention to provide a
federated (global) architecture and system which are extensible and
flexible for providing users with transparent integrated access to
heterogeneous DBMS dispersed over a long haul network.

It is still another object of the present invention 
to provide an architecture and system for use in 
multiple large information management systems 
that are geographically dispersed, such as Com- 
mand and Control, Computer Integrated Manufac- 
turing, Medical Information Management, and many 
applications in intelligent analysis and decision 
support domains that will enable more effective and 
transparent access to existing high data volume 
sources that are collected and stored with different 
geographically dispersed DBMSs. 

It is still another object of the present invention 
to provide a federated information management 
architecture and system where the users have only 
to learn one single interface and one unified view 
of the data. 

The invention, residing in the Federated Information Management (FIM)
architecture described and claimed herein, allows the end-user to
access geographically dispersed multiple information management
systems. It provides the end-user
with a unified view of the underlying information 
management systems. Data distribution and loca- 
tion transparencies are supported by the FIM ar- 
chitecture of the present invention. This means that 
the end-user does not need to know how the data 
is distributed and its location in order to share and 
access relevant data. In addition, the FIM architec- 
ture of the present invention can integrate both 
existing and new information management sys- 
tems. 

Among the advantages of the invention are the 
following: 

1) The invention allows distributed access with- 
out a change to the underlying existing 
databases; 

2) The invention allows a decrease in training 
cost and time for learning different DBMSs lead- 
ing to an improvement in user productivity; 

3) The invention is able to utilize, share and 
combine data that is otherwise dispersed in 
many different physical and logical locations; 

4) The invention allows the overall system to 
evolve and include new information manage- 
ment systems with minimum change; and, 

5) The invention is able to adapt and interface 
with normally incompatible different database 
vendors. 

One novel aspect of the invention therefore is 
the federated architecture coupled with the Inter- 
Site Transaction Service (ISTS) architecture to al- 
low transparent access to a wide variety of DBMSs 
while maintaining the local autonomy of the under- 
lying DBMSs. With this invention architecture, the 
FIM of the present invention can run on top of 
different hardware, operating systems, communica- 
tion networks, and DBMSs. In addition, the system 
of the present invention can evolve to include new 
target DBMSs with minimum changes. The fed- 
erated architecture of the present invention is not 
limited to integrate relational DBMSs, but may also 
integrate legacy DBMSs such as hierarchical or 
network DBMSs, spatial information systems or 
geographical Information Systems, and text retriev- 
al systems. 

The present invention, therefore, provides an 
Intelligent Integration of Information environment to 
support seamless access to large scale heteroge- 
neous information management systems which in- 
cludes relational, spatial, and text systems. The 
invention includes the following features to support 
this environment: 

(1) A federated architecture that supports transparent access to
multiple database systems. It provides the end-user with a unified view
of the underlying database systems. Local autonomy of the underlying
database systems is fully maintained in the federated architecture.
This means that the users can still use the same application to access
the local databases, and only minimum change to the local database
system is required for sharing and remotely accessing relevant data.
The architecture includes several distributed query optimization
methods for fragmented and replicated data. This optimization ability
improves the total query cost by reducing the transmission and the
processing costs of the overall system. Also, the architecture uses
fragment dependency information, called semantic query optimization, to
improve the total cost. A high layer of distributed transaction
services is also provided to separate the lower layer network
communication protocols from the distributed query processing
protocols. The detailed design of this architecture and the Federated
Information Management (FIM) are discussed below.

(2) The extension of the normally passive role of the conventional data
dictionary (DD) repository into active and intelligent roles. In the
active role, the Smart Data Dictionary (SDD) of the present invention
automatically maintains data consistency as new applications or
databases are added. To support reasoning and problem solving
capabilities in a cohesive way, the SDD uses a multi-dimensional
reference model that allows multiple integrated layers of abstractions
spanning a wide variety of data types (text, spatial, etc.). The
following concepts are preferably incorporated into the SDD's design to
achieve greater efficiency and flexibility for acquiring, storing and
manipulating such meta-data:

(a) SDD's modular architecture and multi-dimensional information
structure for multi-media data and knowledge abstractions.

(3) The flexible SDD architecture allows the system to incorporate new
information systems incrementally. But, in a large scale distributed
information environment, the SDD becomes a bottleneck, and the
knowledge represented becomes a large, complex and unmanageable
hierarchy. The present invention minimizes these risks by developing a
cooperative environment for multiple independent SDDs. In this
cooperative model for integration and maintenance, we assume that each
SDD can run autonomously and still be able to interact with other SDDs
for appropriate knowledge without completely integrating all the
foreign SDDs' meta-data. Each SDD contains the meta-data about a
specific domain which might include many databases; in addition, it
will contain knowledge about its nearest neighbors, but not all
neighbors. Despite many challenges such as meta-data consistency,
multiple data/knowledge views, communication and data/knowledge
transformation, the cooperation of multiple SDDs will be an ineluctable
characteristic of large scale integrated information systems.
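
The cooperative arrangement of independent SDDs described in item (3) might be pictured with the following minimal Python sketch. The class name, the `lookup` method, and the idea of forwarding an unanswered request to nearest neighbors are assumptions made for illustration only; the description does not specify how SDDs exchange knowledge.

```python
class SDD:
    """Hypothetical Smart Data Dictionary holding meta-data for one domain."""

    def __init__(self, name, metadata):
        self.name = name
        self.metadata = metadata   # relation name -> description
        self.neighbors = []        # nearest neighbors only, not every SDD

    def lookup(self, relation, visited=None):
        """Answer locally if possible, otherwise ask the nearest neighbors."""
        visited = visited if visited is not None else set()
        if self.name in visited:   # guard against cycles between SDDs
            return None
        visited.add(self.name)
        if relation in self.metadata:
            return (self.name, self.metadata[relation])
        for neighbor in self.neighbors:
            found = neighbor.lookup(relation, visited)
            if found is not None:
                return found
        return None


# Hypothetical domains, each with its own SDD.
climate = SDD("climate", {"rainfall": "daily rainfall per station"})
land = SDD("land", {"soil": "soil type per region"})
ocean = SDD("ocean", {"salinity": "salinity per buoy"})
climate.neighbors = [land]
land.neighbors = [ocean]
ocean.neighbors = [climate]   # a cycle, handled by the visited set

answer = climate.lookup("salinity")
```

In this sketch no SDD copies a foreign SDD's meta-data; a request simply travels along neighbor links until some SDD can answer it.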
The above technologies and guiding concepts 
provide for a novel fusion of existing technologies 
(distributed systems, knowledge-based systems, 
object-oriented systems, heterogeneous databases 
and machine learning). Moreover, they form a basis 
for an intelligent integration of information design 
that represents a significant technical advancement 
over existing technologies. The novel features of 
construction and operation of the invention will be 
more clearly apparent during the course of the 
following description, reference being had to the 
accompanying drawings wherein has been illus- 
trated a preferred form of the device of the inven- 
tion and wherein like characters of reference des- 
ignate like parts throughout the drawings. 

BRIEF DESCRIPTION OF THE FIGURES 

Figure 1 is a diagram illustrating one mode of a 
data communication network between individual 
users or applications and underlying information 
systems in accordance with the present inven- 
tion; 

Figure 2 is a diagram showing the component 
architecture and operational processing flow of a 
distributed information manager in accordance 
with the present invention; 

Figure 3 is a diagram showing the component 
architecture and operational processing flow be- 
tween a distributed information manager and an 
associated local information manager both in 
accordance with the present invention; 

Figure 4 is a diagram showing the component
architecture and operational processing flow of a
local information manager in accordance with
the present invention;

Figure 5 is a diagram showing a Bulk-Load 
Copy Protocol (BCP) in accordance with the 
present invention; 

Figure 6 is a diagram showing the component 
architecture and operational processing flow be- 
tween a Smart Data Dictionary Cache Memory 
Management and a Smart Data Dictionary Serv- 
er in accordance with the present invention; 

Figure 7 is a diagram showing the component
architecture and operational processing flow of a
Cache Memory Management device in accordance
with the present invention;

Figure 8 is a diagram showing a Client/Server
model in accordance with the present invention;

Figure 9 is a diagram showing a Hierarchical Star
Distributed Processing Topology in accordance
with the present invention;

Figure 10 is a diagram showing a Hierarchical
Distributed Processing Topology in accordance
with the present invention; and,

Figure 11 is a diagram showing a Distributed
Transaction Service Hierarchy Topology in
accordance with the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENT

As indicated above, the federated approach of 
the present invention supports complete transpar- 
encies for data distribution and heterogeneities 
among information systems. It also maintains local 
site autonomy by minimizing the requirement for 
changes to all participant sites. This flexible ar- 
chitecture allows the system to evolve in the future 
and insulates the users from changes to the under- 
lying hardware, IMS, database logical and physical 
designs. Once users have learned a single user 
interface and access methodology, they can ac- 
cess all databases without regard to their location 
and design. This increases productivity and saves 
training costs. 

The present invention as embodied in a Fed- 
erated Information Management System (FIMS) de- 
signed to access data from multiple Relational 
DataBase Management Systems (RDBMS), will 
now be described in a preferred form. 

FIMS presents the user with the illusion of a 
single, integrated, non-distributed database through 
a uniform interface, a structured query language 
(SQL) or a graphical user interface (GUI). FIG. 1 
describes the FIMS general architecture. It is com- 
posed of the following major components. 

The Query Browser and Editor (QuBE) module 
10 provides the users 12 with a uniform interface to 
access multiple databases. Users can formulate 
their requests using either SQL or a GUI.

The Distributed Information Manager (DIM) 14 
decomposes the global query into multiple queries. 
It provides a distributed access plan (DAP) which is 
optimized based on the sites' processing and the 
network communications costs. This access plan is 
composed of several local execution plans (LEP), 
one for each site. 

Specifically, the user constructs a global query using QuBE. As far as
the user is concerned, he or she only accesses a single (virtual)
database. QuBE sends the global query to DIM. Based on the meta-data
stored in the Smart Data Dictionary (SDD) 16, described below, DIM 14
decomposes the global query into multiple local queries. It optimizes
the total cost for executing the global query by minimizing the amount
of data needed to be transferred among local sites, and by choosing the
appropriate site for processing the local access plan. It then
coordinates the execution of the multiple local queries.
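
As a rough illustration of the decomposition just described, the following Python sketch groups the relations named in a global query by the site that holds them and picks a processing site that minimizes the data shipped. The dictionary `RELATION_SITES`, the function names, and the byte counts are hypothetical stand-ins for SDD meta-data; the actual decomposition and cost model are not specified at this point in the description.

```python
from collections import defaultdict

# Hypothetical stand-in for SDD meta-data: which site holds which relation.
RELATION_SITES = {
    "parts": "site_a",
    "suppliers": "site_b",
    "orders": "site_a",
}


def decompose(global_relations, relation_sites):
    """Group the relations referenced by a global query by owning site."""
    plan = defaultdict(list)
    for relation in global_relations:
        site = relation_sites[relation]   # KeyError if the SDD has no entry
        plan[site].append(relation)
    # One list of local queries per site; here simply a SELECT per relation.
    return {
        site: [f"SELECT * FROM {rel}" for rel in rels]
        for site, rels in plan.items()
    }


def cheapest_site(bytes_at_site):
    """Pick the processing site that minimizes bytes shipped from the others."""
    total = sum(bytes_at_site.values())
    return min(bytes_at_site, key=lambda s: total - bytes_at_site[s])


local_queries = decompose(["parts", "suppliers", "orders"], RELATION_SITES)
home = cheapest_site({"site_a": 500, "site_b": 2000})
```

Here `site_b` is chosen because processing there requires shipping only the 500 bytes held at `site_a`.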

The DIM 14 is composed of two high level components: the Query
Decomposer and Optimizer 14A, and the Distributed Processing
Coordinator 14B.

FIG. 2 shows the DIM lower level sub-components and the information
flow among them. The Query Decomposer and Optimizer (QDO) 14A
decomposes the query into local sites' queries. Based on the site
processing cost and transmission costs, the QDO builds an execution
plan to minimize the overall query execution cost.

The Syntactic and Semantic Parser (SSP) 18 parses and validates the
syntax of a query. It identifies and verifies the join connectivity. It
interfaces with the Smart Data Dictionary (SDD) 16 to retrieve
information about the Federated Semantic Schema (FSS) and Export
Semantic Schema (ESS) from the underlying databases. The FSS and ESS
represent the local schemata information, the unified view, and the
inter-relationships among entities (views, objects) from multiple local
schemata.

The Optimizer 14A provides careful planning to control the time that is
required to process the query for access to data in multiple databases.
This is important in local DBMSs, but it becomes even more important
when data must be moved across a network. The Optimizer 14A retrieves
the data distribution information (i.e., fragmented, replicated, or
mixed fragmented and replicated), the transmission cost, and the
processing cost from the SDD 16. It then uses an Integrated Replicated
Semi-join algorithm to plan for the overall execution. The general
execution plan is composed of the following steps:

Local Reduction. Based upon the predicates specified in the global
query, this step reduces the amount of data as much as possible before
sending the data to other sites.

Fragment Replication. The Optimizer decides which fragments need to be
replicated in order to minimize the overall query execution cost.

Local Query Execution. The portion of the local query that involves the
replicated fragments is executed, and the intermediate result is sent
to the home site. The home site defaults to the user's site, unless
specified by the user.

Result Integration. The home site integrates the intermediate results
sent by the local sites.
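
The four steps above can be sketched in Python with in-memory rows standing in for relations at two sites. This is a generic semijoin-style reduction written for illustration; it is not the Integrated Replicated Semi-join algorithm itself, whose details are not given here, and all function names and data are hypothetical.

```python
def local_reduction(rows, predicate):
    """Keep only the rows that satisfy the query predicate."""
    return [r for r in rows if predicate(r)]


def semijoin(rows, key, allowed_keys):
    """Keep rows whose join key also appears at the other site."""
    return [r for r in rows if r[key] in allowed_keys]


def integrate(left, right, key):
    """Join the intermediate results at the home site."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Hypothetical data held at two sites.
orders = [{"part": 1, "qty": 5}, {"part": 2, "qty": 0}, {"part": 3, "qty": 7}]
parts = [{"part": 1, "name": "bolt"}, {"part": 2, "name": "nut"},
         {"part": 4, "name": "gear"}]

reduced_orders = local_reduction(orders, lambda r: r["qty"] > 0)  # at site A
join_keys = {r["part"] for r in reduced_orders}      # small message A -> B
shipped_parts = semijoin(parts, "part", join_keys)   # reduced at B, then shipped
result = integrate(reduced_orders, shipped_parts, "part")  # at the home site
```

Only one of the three `parts` rows crosses the network in this sketch, which is the kind of saving the local reduction step is meant to achieve.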

The Execution Plan Generator (EPG) 20 translates the execution plan for
each site from an internal data structure to the Distributed
Intermediate Structured Query Language (DISQL), an extension of SQL.
For each local site, the local execution plan is composed of three
files containing SQL statements as well as schema and other data.






The local execution plan involves interfaces 
with the target DBMS. There are two major modes 
of coordination between DIM 14 and LIMs 22 for 
executing the local execution plan, namely the autonomous
and full-coordination modes.

In the autonomous mode, DIM 14 simply sends 
each LIM 22 its local execution plan which can be 
executed independently and in parallel. There is no 
coordination between DIM and LIMs in autonomous 
mode. Each LIM coordinates its own local execu- 
tion plan which includes interfaces with local DBMS 
and interaction with other LIMs.

In the full-coordination mode, the DIM coordinates
all the local execution plans by serializing
each local execution step among LIMs. This means
that no coordination needs to be done at each
LIM, because it only interacts with the DIM and not
with other LIMs. Parallel processing of each LIM is 
inhibited in this case, and the DIM becomes a 
bottleneck. 

Intuitively, the autonomous mode provides better performance through
parallelism. However, deadlock problems that might occur during
communication among LIMs for fragment replication need to be addressed.
Deadlock problems can be resolved using either an operating system
interrupt or a client/server interrupt. In a long-haul network and
large scale heterogeneous environment, a means of using interrupts
independently of the underlying operating systems is not currently
available. For example, the Sybase Open Server does not support
interrupts; it only provides the user with UNIX interrupts, which are
not usable for running LIMs in either operating system.

The present invention therefore uses an approach that employs a
Distributed Processing Coordinator (DPC) 24 component in the DIM 14 to
provide a semi-coordination mode which is based on the logic of each
local execution step. This means that the autonomous mode is used only
when no deadlock situation is expected; otherwise, the
full-coordination mode is employed.

The basic execution steps which are coordi- 
nated by DPC 24 in semi-coordination mode are 
described as follows in FIGs. 3 and 4:

The first step is to send each LIM 22 its local 
execution plan. The local execution plan is in the 
form of three files which are sent to each LIM 22 
using the remote file transfer utility 26 provided by 
the Inter-Site Transaction Service (ISTS) described 
below. The first file 28A is a data file containing 
fragment and distribution information needed to 
execute the local query. The second file 28B con- 
tains SQL statements to reduce the local data, 
thereby simplifying the local query and minimizing 
the amount of data transmitted between LIMs. The 
third file 28C contains the local query to be ex- 
ecuted. 
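
One way to picture the three-file local execution plan is the small Python data structure below. The class, the field names, and the file-naming scheme are hypothetical; the description states only what each file contains, not its format.

```python
from dataclasses import dataclass, field


@dataclass
class LocalExecutionPlan:
    """Hypothetical in-memory view of the three files sent to one LIM."""
    site: str
    fragment_info: dict = field(default_factory=dict)  # file 28A: fragment and distribution data
    reduction_sql: list = field(default_factory=list)  # file 28B: statements that reduce local data
    local_query: str = ""                               # file 28C: the local query to execute

    def files(self):
        """Return (name, content) pairs, one per file to transfer."""
        return [
            (f"{self.site}.data", repr(self.fragment_info)),
            (f"{self.site}.reduce.sql", ";\n".join(self.reduction_sql)),
            (f"{self.site}.query.sql", self.local_query),
        ]


plan = LocalExecutionPlan(
    site="site_a",
    fragment_info={"replicate": {"orders_reduced": ["site_b"]}},
    reduction_sql=["CREATE TABLE orders_reduced AS SELECT * FROM orders WHERE qty > 0"],
    local_query="SELECT part, qty FROM orders_reduced",
)
outgoing = plan.files()
```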



The second step is to send each LIM 22 a request to execute its local
reduction queries. Since there is no interaction between LIMs during
local reduction, this step is done asynchronously, allowing the LIMs to
execute in parallel.

The third step is to send each LIM a request to replicate relevant
fragments. The information as to what fragments need to be replicated
is contained in the data file 28A sent to the LIM 22 during step 1
above. This step is done sequentially so that only one LIM is doing
fragment replication at any time. This will prevent deadlock situations
by ensuring that all other LIMs will be available as servers to receive
the fragment data. Depending on the timing, it is possible for a LIM to
still be executing its local reduction plan when another LIM attempts
to send it fragment data. In that case, the LIM sending the fragment
data is guaranteed to have to wait only a limited amount of time before
its request is serviced.

The fourth step is to request that each LIM 22 
execute its local query. This step is executed asyn- 
chronously allowing parallel execution of multiple 
LIMs. 

The fifth step is to send each LIM 22, except the home site, a request
to send its intermediate result from step four above to the home site.
This step is executed synchronously to guarantee that the home site has
received all the intermediate results.

The sixth step is to send a request to the home site to combine the
intermediate results received from other sites with its own
intermediate results, and to send the final result back to the DIM 14.
This step is also done synchronously to prevent waiting global query
requests at the DIM from executing before the results are received,
which would cause a deadlock situation.

As a seventh and final step, the DPC 24 archives the results and
associates them with their schema and index for later retrieval by the
application.
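
The seven steps above mix parallel and serialized phases, which the following Python sketch tries to make concrete. `FakeLIM`, the step names, and the use of a thread pool are all assumptions for illustration; the real DPC communicates with LIMs through the ISTS, and the description does not say whether step 1 is parallel (it is run sequentially here).

```python
from concurrent.futures import ThreadPoolExecutor


class FakeLIM:
    """Hypothetical stand-in for a LIM; records the requests it receives."""

    def __init__(self, name, log):
        self.name, self.log = name, log

    def request(self, step):
        self.log.append((step, self.name))
        return f"{self.name}:{step}"


def run_in_parallel(lims, step):
    """Send the same request to every LIM at once and wait for all replies."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda lim: lim.request(step), lims))


def run_in_sequence(lims, step):
    """Send the request to one LIM at a time."""
    return [lim.request(step) for lim in lims]


def coordinate(lims, home):
    run_in_sequence(lims, "receive_plan")          # step 1
    run_in_parallel(lims, "local_reduction")       # step 2: no LIM interaction
    run_in_sequence(lims, "fragment_replication")  # step 3: one LIM at a time
    run_in_parallel(lims, "local_query")           # step 4
    others = [lim for lim in lims if lim is not home]
    run_in_sequence(others, "send_result")         # step 5: home site excluded
    final = home.request("integrate")              # step 6
    return {"result": final}                       # step 7: kept for retrieval


log = []
lims = [FakeLIM("a", log), FakeLIM("b", log), FakeLIM("c", log)]
archive = coordinate(lims, home=lims[0])
```

Serializing step 3 means every LIM that is not currently replicating is free to act as a server for incoming fragments, which is the deadlock-avoidance argument made in the text.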

The Local Information Manager (LIM) 22 executes and monitors the local
execution plan and interfaces with foreign DBMS for data retrieval. The
specific LIM architecture will now be discussed.

For each DataBase Management System (DBMS), a LIM 22 is required to
provide the mapping from the global view to the local view, the
translation from DISQL to the target DBMS language, and the interface
to the target DBMS. The sub-components of the LIM are: the Local
Controller; the Local Reduction Processor; the Subquery Processor; the
Result Integrator; the Relation Replicator; and, the Relational DBMS
Interface. The LIM's preferred modular architecture facilitates the
building of new LIMs. For example, common features of relational
databases such as "join", "projection" and "selection" are modularized
into appropriate modules which can be shared between the LIMs for each
otherwise incompatible database.

The Local Controller (LC) 22A controls the execution of the local plan
sent by the DIM 14 by coordinating operations of the other LIM
components. As mentioned above, this can be done synchronously or
asynchronously. In both modes, exactly the same components are used
except that the timing of execution in synchronous mode is controlled
by the DIM 14. A local execution plan is broken into five discrete
steps: local reduction; fragment replication; local execution; result
transmission; and, result integration. The last two steps are mutually
exclusive since the home site performs result integration, but it never
needs to transmit the results. In synchronous mode, each step is
executed directly from the DIM through the ISTS; while, in asynchronous
mode, all the steps are executed by the LC 22A through ISTS, but no
coordination is needed by DIM.
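
A minimal Python sketch of the Local Controller's step sequencing follows. The function names and the `wait_for_dim` callback are hypothetical; the sketch only shows that the home site and the other sites end with different final steps, and that synchronous mode defers the timing of each step to the DIM.

```python
def local_steps(is_home_site):
    """Return the steps one LIM runs; the last two are mutually exclusive."""
    steps = ["local_reduction", "fragment_replication", "local_execution"]
    steps.append("result_integration" if is_home_site else "result_transmission")
    return steps


def run_plan(is_home_site, execute, synchronous=False, wait_for_dim=None):
    """Run the local plan; in synchronous mode each step waits for the DIM."""
    for step in local_steps(is_home_site):
        if synchronous:
            wait_for_dim(step)   # the DIM decides when the step may start
        execute(step)            # same components are used in both modes


executed, signals = [], []
run_plan(False, executed.append, synchronous=True, wait_for_dim=signals.append)
```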

The Local Reduction Processor (LRP) 22B is 
responsible for executing the SQL statements in 
the local reduction file received from the DIM 14. 
Before starting execution, the LIM's internal data 
structures must be initialized with the information 
obtained from the data file received from DIM 14.
At this point, the LIM is initialized with the required 
information for the entire local plan, rather than just 
with the local reduction step. This information in- 
cludes such items as: result fragments; local frag- 
ments; and, replicated fragments which will be 
used during execution and the schema for those 
fragments. Information in the data file also determines
the destination sites for sending the fragments and
intermediate results during steps 3 and 5.
The LIM also accesses the SDD 16
to retrieve meta-data about the schema of all local 
relations used in the local plan. Once the above 
initialization is completed, the LRP 22B executes 
each of the SQL statements contained in the local 
reduction file. These SQL statements are typically 
either "create" or "select" statements. "Create" 
statements are used to create a temporary relation 
to hold fragments received from other LIMs. "Se- 
lect" statements are used to reduce data of local 
relations before replicating them to other sites. 
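
The kind of statements a local reduction file might hold can be illustrated with Python's built-in `sqlite3` module standing in for the target RDBMS. The table names and statements are hypothetical, and the reduction is written as `CREATE ... AS SELECT` so that the reduced rows are kept in a relation; the description itself says only that the file contains "create" and "select" statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (part INTEGER, qty INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 5), (2, 0), (3, 7)])

# Hypothetical contents of a local reduction file.
reduction_file = [
    # "create": temporary relation that will hold a fragment from another LIM
    "CREATE TEMP TABLE parts_fragment (part INTEGER, name TEXT)",
    # "select": reduce a local relation before it is replicated elsewhere
    "CREATE TEMP TABLE orders_reduced AS "
    "SELECT part, qty FROM orders WHERE qty > 0",
]

# The Local Reduction Processor executes each statement in order.
for statement in reduction_file:
    conn.execute(statement)

reduced = conn.execute(
    "SELECT part, qty FROM orders_reduced ORDER BY part"
).fetchall()
```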

The Fragment Replicator (FR) 22C uses one 
ISTS, a Bulk-load Copy Protocol (BCP) 30, for 
efficiently replicating fragmented relations between 
LIMs. As shown in FIG. 5, the BCP 30 can be 
further broken into an application layer which is 
RDBMS independent, and a lower database layer 
which is actually part of the Relational DataBase 
Interface. 

The BCP is preferably implemented with two goals in mind: portability
across RDBMSs and performance efficiency. Portability is achieved since
the application layer is reusable across all LIMs. Unfortunately, the
database layer needs to be customized for each DBMS. Efficiency is
improved by requiring only one access to each relation being replicated
regardless of the number of recipients. This is achieved by
transmitting the relation data to each recipient as it is retrieved
from the database. Efficiency is further improved by allowing the
database layer to be customized, thereby taking advantage of any
special bulkload transfer facilities supported by the underlying RDBMS.
For example, in Sybase, the most efficient method of getting data both
in and out of a relation is to use the "bulkload" utilities, while in
Oracle, the most efficient way to get data out of a relation is to use
an SQL query and array bindings. The most efficient way to insert the
data into Oracle is to use the SQL*Loader utility.

The application layer of the BCP 30 interacts 
with the database layer through an ASCII buffer 32, 
which allows it to be independent of the method 
used to retrieve the data. A set of buffering utilities 
converts data from the representation used in the 
output (e.g., a row of data in memory in Oracle) to 
ASCII, and from ASCII to the representation used 
for input (e.g., a SQL*Loader data file in Oracle). The 
actual data sent across the network contains the 
relation schema and other information such as the 
number of rows and the size of the data. 
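
The single-access replication and the ASCII buffering described above can be illustrated with a short sketch. It assumes, for illustration only, a pipe-delimited text encoding and in-memory streams in place of network connections; the `replicate` function and the sample relation are hypothetical.

```python
import io

def replicate(rows, recipients):
    """Read the relation once and forward each row to every recipient
    as soon as it is retrieved, so the source is accessed a single time."""
    rows_read = 0
    for row in rows:                                   # one pass over the relation
        rows_read += 1
        line = "|".join(str(v) for v in row) + "\n"    # convert the row to ASCII
        for out in recipients:                         # fan out to all recipients
            out.write(line)
    return rows_read

relation = [(1, "bolt"), (2, "nut")]
site_b, site_c = io.StringIO(), io.StringIO()          # stand-ins for two remote LIMs
rows_read = replicate(iter(relation), [site_b, site_c])
```

Because the rows are passed as a one-shot iterator, the sketch cannot read the relation a second time, however many recipients are listed.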

After the above local reduction and fragment 
replication steps, all relevant fragments are copied 
and stored in temporary relations at designated 
sites. The Local Query Processor (LQP) 22D is 
responsible for executing queries in the local query 
file received from the DIM 14. Before executing 
these queries, the LQP 22D must combine all 
fragments of the same relation (those retrieved locally 
and those replicated from other LIMs). The LQP 
22D generates the appropriate SQL statements 
needed to combine the fragments and executes 
them using the RDBI. After the fragments are 
combined, the local query is executed using the RDBI, 
and the intermediate results are stored in a file. 

The Result Transmitter (RT) 22E component 
simply uses the remote file transfer (RFT) protocol 
provided by the ISTS 34 to transfer the file 
containing the above intermediate results to the home site. 
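
As an illustration of the fragment-combining step performed by the LQP, the sketch below generates one SQL statement that merges all fragments of a relation and then runs a local query on the result. It assumes horizontal fragments with identical schemas combined by `UNION ALL`, which the specification does not state; sqlite3 stands in for the RDBI, and all table names and the `combine_sql` helper are hypothetical.

```python
import sqlite3

def combine_sql(relation, fragment_tables):
    """Build a statement that merges all fragments of one relation
    into a single temporary relation (assumes identical schemas)."""
    union = " UNION ALL ".join(f"SELECT * FROM {name}" for name in fragment_tables)
    return f"CREATE TEMP TABLE {relation} AS {union}"

conn = sqlite3.connect(":memory:")
# one fragment retrieved locally, one replicated from another LIM
conn.execute("CREATE TABLE emp_local (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE emp_from_site2 (id INTEGER, name TEXT)")
conn.execute("INSERT INTO emp_local VALUES (1, 'Ann')")
conn.execute("INSERT INTO emp_from_site2 VALUES (2, 'Bob')")

conn.execute(combine_sql("emp", ["emp_local", "emp_from_site2"]))
# the local query then runs against the combined relation
result = conn.execute("SELECT id, name FROM emp ORDER BY id").fetchall()
```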

The Result Integrator (RI) 22F combines all the 
intermediate results received from other LIMs (via 
their result transmitters) with its own intermediate 
results output by its Local Query Processor 22D. It 
will then return the combined results to the DIM 14. 
In asynchronous mode, the combined results are 
returned to the DIM 14 as a separate step in which 
the DIM 14 becomes a server, and the LIM 22 
becomes a client. As previously discussed, this can 
potentially cause a deadlock situation to occur, 
since the DIM 14 may start the execution of 
another query before it has received the results of the 
previous query. This is acceptable if the DIM 14 
can be interrupted; otherwise, the LIM 22 will be 
waiting to transmit its combined results while the 
DIM 14 is waiting to execute the next query at the 
same LIM 22, thus causing a deadlock. Synchronous 
mode operation avoids this problem by returning 
the combined results to the DIM 14 as a 
response to the combine result request. In this 
way, the DIM 14 cannot start the execution of 
another query until result integration is complete at 
the home site, and the combined results are 
returned to the DIM 14. 

At this juncture, it should be understood that 
the present invention also includes architecture 
where at least one LIM under a DIM is also a DIM, 
so that the same architecture is recursively repli- 
cated with the second DIM acting as a LIM to its 
overlying DIM and as a DIM to its underlying LIMs. 
In other words, the same architecture being 
described herein between a DIM and a series of LIMs 
may be replicated in a recursive manner by 
replacing at least one of the underlying LIMs with a DIM 
and continuing the replication as needed to group 
LIMs into a logical or physical responding unit. 
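
The recursive arrangement described above can be sketched as a simple composite structure. This is an illustrative assumption about how the roles relate, not the claimed implementation: the `LIM` and `DIM` classes below, their `execute` method, and the list-of-numbers "databases" are all hypothetical.

```python
class LIM:
    """Leaf node: answers a request from its own local data (here, a list)."""
    def __init__(self, rows):
        self.rows = rows

    def execute(self, predicate):
        return [r for r in self.rows if predicate(r)]

class DIM:
    """Coordinator: fans a request out to its children and integrates the results.

    It exposes the same execute() method as a LIM, so a DIM can be placed
    under another DIM and appear to it as just another LIM.
    """
    def __init__(self, children):
        self.children = children

    def execute(self, predicate):
        combined = []
        for child in self.children:
            combined.extend(child.execute(predicate))
        return combined

regional = DIM([LIM([1, 8]), LIM([12])])   # a DIM grouping two LIMs
top = DIM([LIM([3, 20]), regional])        # the grouping acts as one LIM here
answer = top.execute(lambda r: r > 5)
```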

The Smart Data Dictionary (SDD) Server 16 
contains information such as: schema; data 
distribution; site configuration; domain knowledge; 
and, inter-site relationships. The SDD 16 itself is a 
database containing meta-data that can be used to 
support the DIM 14 and the LIM 22 in processing 
the queries. SDD data storage may be accom- 
plished by a UNIX file system or an Object-Ori- 
ented DataBase Management System (OODBMS) 
such as ITASCA. 

The SDD server 16 supports requests from a 
remote site to access the SDD's meta-data stored in 
a UNIX file system. The DIM 14 and LIM 22 may 
access the SDD server 16 remotely to retrieve 
meta-data for parsing, translating, optimizing, and 
coordinating the global and local queries. In FIG. 1, 
the SDD server 16 acts as a replacement for the 
knowledge-based manager for an SDD that is stored 
in a UNIX file system. 

The SDD contains meta-data such as: data 
distribution information, schema description, and 
FIM system configuration. The DIM uses the sche- 
ma and data distribution to generate the execution 
plans. The LIM uses the schema to perform local 
queries and map local queries to other sites. Cach- 
ing the meta-data at the processing site will greatly 
reduce the communication and accessing cost. For 
example, using Cache Memory Management 
(CMM), DIM 14 can access and cache the relevant 
data which might be used by the next query. This 
will eliminate unnecessary communication with the 
SDD 16 for retrieving the schema. Each LIM 22, 
DIM 14, and SDD server 16 uses the same CMM. 



FIG. 6 illustrates the CMM and the SDD server 
architecture. 

Specifically, all access to the SDD schema is 
done through the CMM which holds in memory the 
schema for the most recently used relations. If the 
data is not in the cache memory, CMM 
transparently retrieves the relation locally (i.e., from a file or 
DBMS) or from another remote server, or both. 
Requests to the cache can either be for specific 
relations (such as relation name, field ID, number 
of fragments, etc.) or for entire schema. For 
efficiency reasons, the DIM and LIMs usually load the 
schema for all relations used in a query during the 
initialization hierarchy representing a sub-tree from 
the global view starting at a specific level. The 
cache may use a linear search to access 
information, and be built using a modular architecture 
which allows easy replacement of the algorithms to 
search the cache and selectively swap out 
candidates with more elaborate ones. The time of last 
use, the time brought into cache, and the number 
of times accessed are maintained for each cache 
entry to support such algorithms. The amount of 
memory used by the cache is a function of the 
number of relations it is holding. However, the 
maximum number of relations which it can hold 
must be specified when initializing the cache. 
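
A minimal sketch of such a cache is given below. It keeps the three per-entry statistics named above, takes the maximum number of relations at initialization, and accepts a replaceable victim-selection function. The class and function names are hypothetical, a logical counter is used in place of wall-clock time, and a dictionary lookup is used instead of the linear search mentioned in the text.

```python
import itertools

def least_recently_used(entries):
    """Default swap-out policy: pick the entry with the oldest last use."""
    return min(entries, key=lambda name: entries[name]["last_used"])

class SchemaCache:
    def __init__(self, max_relations, loader, pick_victim=least_recently_used):
        self.max_relations = max_relations   # must be given when initializing
        self.loader = loader                 # fetches a schema on a cache miss
        self.pick_victim = pick_victim       # replaceable swap-out algorithm
        self.entries = {}
        self._clock = itertools.count(1)     # logical time, for determinism

    def get(self, relation):
        now = next(self._clock)
        entry = self.entries.get(relation)
        if entry is None:
            if len(self.entries) >= self.max_relations:
                del self.entries[self.pick_victim(self.entries)]
            entry = {"schema": self.loader(relation),
                     "loaded_at": now, "last_used": now, "hits": 0}
            self.entries[relation] = entry
        entry["last_used"] = now
        entry["hits"] += 1
        return entry["schema"]

loads = []
def load_schema(name):
    loads.append(name)                       # record every trip to the backing store
    return {"relation": name, "fields": ["id"]}

cache = SchemaCache(max_relations=2, loader=load_schema)
for name in ["emp", "dept", "emp", "proj"]:
    cache.get(name)
```

A different policy, for example one based on the access count, could be supplied through `pick_victim` without touching the rest of the class.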

The SDD server has two Distributed 
Transaction Services (ISTS): "send catalog" and "send 
schema" to support the operation of the SDD 
cache memory management. The "send catalog" 
service sends the client a copy of the relation 
catalog. The catalog contains a list of relations, 
each associated with an access type, and a server 
name. If a schema is not in cache memory, the 
access type determines how to retrieve the data 
such as from a local file, from a local RDBMS, or 
from another server. While the catalog may be 
retrieved prior to the execution of the first access 
to the cache, it is preferred that it be broadcast to 
all processes upon start up. The catalog will 
preferably be maintained so that it is consistent on all 
sites at the same time. The "send catalog" service 
is a special memory to memory transfer between 
the server and the client. 

The "send schema" service accepts requests 
for multiple relations at different levels in the 
hierarchy. The SDD server contains the same cache as 
other client/servers. Meta-data will be first searched 
using the cache memory. If the meta-data is not 
found in cache, the CMM connects to the server 
associated with the meta-data in the catalog, and 
retrieves the requested meta-data. The server can 
reside locally or remotely. The CMM's modular 
architecture provides the capability to adapt to different 
storage management such as: OODBMS, relation, 
file and knowledge base management. 

FIG. 7 shows the interactions between various 
components of CMM 36 and the server. The bulk 
of SDD server requests will be for schema access 
for which most of the work will be done by the 
local access methods of the cache in a transparent 
way to the service handler. 
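
The catalog-driven lookup described above can be sketched as follows. The catalog layout (relation name mapped to an access type and a server name) follows the text; the function names, the access-type labels, and the returned dictionaries are hypothetical placeholders for real file, RDBMS, or remote-server access.

```python
calls = []   # records every retrieval that bypasses the cache

def read_local_file(relation, server):
    calls.append(("local_file", relation, server))
    return {"relation": relation, "source": "file"}

def ask_remote_server(relation, server):
    calls.append(("remote_server", relation, server))
    return {"relation": relation, "source": server}

def fetch_schema(relation, cache, catalog, access_methods):
    """Search the cache first; on a miss, let the catalog's access type
    decide how and where the meta-data is retrieved, then cache it."""
    if relation in cache:
        return cache[relation]
    access_type, server = catalog[relation]
    schema = access_methods[access_type](relation, server)
    cache[relation] = schema
    return schema

catalog = {"emp": ("local_file", "sdd1"), "dept": ("remote_server", "sdd2")}
access_methods = {"local_file": read_local_file,
                  "remote_server": ask_remote_server}
cache = {}

first = fetch_schema("emp", cache, catalog, access_methods)
second = fetch_schema("emp", cache, catalog, access_methods)   # served from cache
remote = fetch_schema("dept", cache, catalog, access_methods)
```

The caller never names a storage backend, which is the transparency the text attributes to the cache's local access methods.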

The Inter-Site Transaction Services (ISTS) 34 
provides the inter-connection among sites. A set of 
distributed transaction services are provided to 
support the distributed access, local access plans, 
execution, and SDD meta-data access. The Distrib- 
uted Processing Coordinator (DPC), local controller 
(LC), and CMM (Cache Memory Management) 
components use ISTS to support inter-process 
communications. 

Specifically, ISTS 34 supports the 
inter-connection among different components: DIM 14; LIMs 
22; and the SDD server 16, which may reside at 
different sites and communicate through different 
network protocols such as TCP/IP, XN25, etc. A set 
of Inter-Site Transaction Services 34 using the 
Sybase Client/Open Server is preferably developed 
to support this communication. 

The Sybase Client/Open Server software 
provides the transparency of low level network 
protocols and manages resources of each connection. 
The Sybase Client/Open Server consists of Open 
Client, a programmable library interface for use by 
client applications, and the Sybase Open Server for 
use by servers of any type. These two interfaces 
offer a functionally rich set of library routines. The 
software provides the necessary toolkit to implement 
transparent client/server communications between 
different database products such as Oracle and 
Ingres, and between non-database servers, e.g., 
UNIX file system, on heterogeneous computers 
and networks. The interfaces can run on separate 
processors and platforms, communicating through 
a variety of Local Area Network (LAN) transport 
protocols (TCP/IP, XN25, etc.), or run together on a 
single processor as shown in FIG. 8. Network 
support is built into the products. The Open Server's 
multi-thread architecture delivers high performance 
as a shared database server. 

It is preferred that the distributed processing 
architecture of the ISTS contain at least two major 
communication topologies: Hierarchical Distributed 
Processing Topology (HDPT) and Hierarchical Start 
Distributed Processing Topology (HSDPT) for con- 
necting the Query Browser and Editor (QuBE); 
Knowledge Base Manager (KBM) or SDD server; 
DIM; and LIMs components. 

The Hierarchical Distributed Processing 
Topology (HDPT) is an abstract model shown in FIG. 9 
used to support the distributed query processing of 
the present invention. This model improves the 
reliability and reduces the extent of communication 
and processing bottlenecks at the cost of inter-site 
communication and the complexity of the LIM. If 
the communication cost is negligible, then the 
HDPT described in FIG. 10 is more suitable. 

In the Hierarchical Start Distributed Processing 
Topology (HSDPT) illustrated in FIG. 9, there is no 
communication among the LIMs 22, and all the 
coordination is done at the DIM 14. This topology 
simplifies the LIM development, and there is no 
cost of inter-communication among the LIMs. 
However, the DIM becomes a bottleneck and is very 
vulnerable to failure. 

In general, the decision of choosing one 
topology over the other topology depends on the 
specific application environment. 

The present invention takes advantage of the 
Sybase Open Server features to support the 
client-to-server, server-to-server, and server-to-client 
communications in a heterogeneous environment 
among DIM, LIMs, KBM and QuBE. Each ISTS 38, 
as illustrated in FIG. 11, consists of a pair of Client 
Request and Server Service. The ISTS fits into the 
Application Layer of the standard OSI protocols, 
and consists of three sub-layers: 

(a) Application Interface Layer: provides the high 
level interface to any application that requires 
the inter-process communication. Routines at 
this level are usually independent of the specific 
hardware and software and are therefore 
reusable across applications; 

(b) Distributed Transaction Layer: supports the 
necessary services for distributed databases. 
Routines at this level are usually hardware 
and/or software specific, and may therefore 
need to be modified when interfacing with 
different DBMSs or operating systems; 

(c) Network Communications Layer: is the layer 
that provides the logical/physical connection and 
inter-connection among sites. The Sybase 
Client/Open Server may be used at this level. 

The ISTS can be organized in a hierarchy as 
shown in FIG. 11 which encodes both a class 
hierarchy and a composition hierarchy. Dotted lines 
represent a "type-of" relation between services, 
while solid black lines represent a "part-of" relation 
between services. For example, a language 
interface has three main components: a parser, a 
translator, and common utilities, which are "part-of" 
but not a sub-class of a language interface. On the 
other hand, the Oracle and Sybase translators are 
a "type-of", or sub-class of language translators. 
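
The distinction between the two relations can be shown with a short sketch: inheritance models "type-of", and object composition models "part-of". The class names and the toy `parse`/`translate` behaviour are hypothetical and only illustrate the structure, not any actual DISQL translation.

```python
class Parser:
    def parse(self, text):
        return text.split()          # toy tokenizer

class Translator:
    """Base class: every concrete translator is a "type-of" Translator."""
    def translate(self, tokens):
        raise NotImplementedError

class SybaseTranslator(Translator):
    def translate(self, tokens):
        return "sybase: " + " ".join(tokens)

class OracleTranslator(Translator):
    def translate(self, tokens):
        return "oracle: " + " ".join(tokens)

class LanguageInterface:
    """Holds a parser and a translator as parts ("part-of");
    neither component is a sub-class of LanguageInterface."""
    def __init__(self, parser, translator):
        self.parser = parser
        self.translator = translator

    def convert(self, text):
        return self.translator.translate(self.parser.parse(text))

iface = LanguageInterface(Parser(), OracleTranslator())
out = iface.convert("select  name from emp")
```

Swapping `OracleTranslator()` for `SybaseTranslator()` changes the target dialect without altering the language interface itself.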

It is preferred that the major ISTS groups in- 
clude: 

(1) A Network Interface that provides 
connectivity plus high level protocols such as remote 
file transfer, remote bulk copy, meta-data 
processing and distributed query processing. The 
Connectivity services are at the DTS level and 
support connections and communications to 
other server/clients. In the preferred embodiment, 
the connectivity services are built on top of the 
Sybase open client/server libraries and provide 
the network building blocks for most other 
services. Protocols are at the application level and 
use connectivity services as well as other 
services from the hierarchy; for example, the BCP 
protocol in the network interface uses the BCP 
utilities that are part of the relational database 
interface. 

(2) A Language Interface that provides for the 
mapping from one language to another such as 
from DISQL to Sybase SQL or Oracle SQL. 
Some common operations can be shared 
between language interfaces, but most likely 
each will require a separate parser and 
translator. 

(3) A Database Interface that provides access to 
the underlying database supporting such 
operations as connection, query execution (open, 
parse, execute), and result retrieval (bind, next 
row). The interface also provides bulk access 
methods (in, out, buffer) if they exist. 

(4) A Distributed Query Processing Interface that 
provides the services needed to support the 
DIM (optimizer) and the operation of the 
Distributed Processing controller and the Local 
Controller (distributed query execute, local query 
execute, local reduction, etc.). 

The ISTS layer architecture supports an open 
and modular implementation for interoperability of 
heterogeneous information management systems. 
It takes advantage of third party software. 
Changes to one layer will not affect other layers. 
For example, if one communication package were 
replaced with another, the implementation of the 
other layers would remain intact. 

The invention described above is, of course, 
susceptible to many variations, modifications and 
changes, all of which are within the skill of the art. 
It should be understood that all such variations, 
modifications and changes are within the spirit and 
scope of the invention and of the appended claims. 
Similarly, it will be understood that Applicant 
intends to cover and claim all changes, modifications 
and variations of the example of the preferred 
embodiment of the invention herein disclosed for 
the purpose of illustration which do not constitute 
departures from the spirit and scope of the present 
invention. 


Claims 

1. A computer data network having a 
communications medium commonly connecting a plurality 
of databases containing data with a plurality of 
users (12) each capable of generating a global 
data request for accessing and retrieving data 
from said databases in accord with a single 
query protocol, the network comprising a 
globally integrated data retrieval controller 
architecture for controlling and directing the 
transmission of the user generated global data 
request to individual ones of the plurality of 
databases and for receiving and integrating the 
requested data received from the databases 
into a single response and for transmitting the 
integrated single response to the requesting 
user (12), characterized by: 

- a smart data dictionary (SDD) means 
(16) containing a database of data 
representing schema, data distribution, local 
site configuration and inter-site 
relationships of data among the databases in the 
network, for each database in the 
network; 

- a data information manager (DIM) means 
(14) communicating both with said smart 
data dictionary (SDD) means (16) to re- 
trieve data therefrom, and with said user 
(12) to receive said global data query 
therefrom and to transmit responsive 
data thereto, for decomposing the global 
data query into a local-site execution 
plan (LEP) for retrieval of data from each 
database having data responsive to the 
global data query in accord with the data 
contained in said smart data dictionary 
(SDD) means (16), and for transmitting 
that portion of said local-site execution 
plan (LEP) to be executed to the appro- 
priate database for execution, and receiv- 
ing data therefrom responsive to said 
local-site execution plan (LEP); 

- a plurality of local information manager 
(LIM) means (22), each communicating 
with said data information manager (DIM) 
means (14) and said smart data dictio- 
nary (SDD) means (16), for controlling 
data flow to and from a specified 
database in the network in response to 
that portion of said local-site execution 
plan (LEP) received from said data in- 
formation manager (DIM) means (14) and 
for transmitting retrieved data responsive 
to that portion of said local-site execution 
plan to said data information manager 
(DIM) means (14); and 

- each said local information manager 
(LIM) means (22) further adapted for 
generating, in accord with the data con- 
tained in said smart data dictionary 
(SDD) means (16), a data retrieval re- 
quest for execution by another local in- 
formation manager (LIM) means (22) and 
for receiving data therefrom in response 
thereto, in order to complete that portion 
of said local-site execution plan (LEP) 
received by it for execution. 

2. A computer data network controlling and 
directing the transmission of a user generated 
global data request to individual ones of a 
plurality of nodes and associated databases 
and for receiving and integrating the requested 
data received from the databases through the 
nodes into a single response and for 
transmitting the integrated single response to the 
requesting user (12), characterized by: 

- a plurality of nodes, each having 
associated therewith a database containing 
data; 

- a communication medium connecting the 
nodes with a plurality of users (12), each 
capable of generating a global data 
request for accessing and retrieving data 
from the databases through their 
associated nodes in accord with a single 
query protocol; 

- a smart data dictionary (SDD) node (16) 
connected to the computer data network 
and controlling input/output access to a 
database of data representing schema, 
data distribution, local site configuration 
and inter-site relationships of data among 
the nodes and their associated databases 
in the network, for each node and 
associated database in the network; 

- a data information manager (DIM) 
controller (14) communicating both with said 
smart data dictionary (SDD) node (16) to 
retrieve data therefrom, and with said 
user (12) to receive said global data 
query therefrom and to transmit responsive 
data thereto, for decomposing the global 
data query into a local-site execution 
plan (LEP) for retrieval of data from each 
database through its associated node, 
said database having data responsive to 
the global data query in accord with the 
data contained in said database 
associated with said smart data dictionary 
(SDD) node (16), and for transmitting that 
portion of said local-site execution plan 
(LEP) to be executed to the appropriate 
node and associated database for 
execution, and receiving data therefrom 
responsive to said local-site execution plan; 

- a plurality of local information manager 
(LIM) controllers (22), each 
communicating with said data information manager 
(DIM) controller (14) and said smart data 
dictionary (SDD) node (16) for controlling 
data flow to and from a specified 
database in the network in response to 
that portion of said local-site execution 
plan (LEP) received from said data 
information manager (DIM) controller (14) 
and for transmitting retrieved data 
responsive to that portion of said local-site 
execution plan (LEP) to said data 
information manager (DIM) controller (14), 

- each said local information manager 
(LIM) controller (22) further adapted for 
generating, in accord with the data 
contained in said database associated with 
said smart data dictionary (SDD) node 
(16), a data retrieval request for 
execution by another local information 
manager (LIM) controller (22) and for 
receiving data therefrom in response thereto, in 
order to complete that portion of said 
local-site execution plan (LEP) received 
by it for execution. 

3. The network of claim 1 or 2, characterized by 
at least one local information manager (LIM) 
controller (22) controlling data flow to and from 
at least two local databases and adapted to 
decompose that portion of said local site ex- 
ecution plan (LEP) received from said data 
information manager (DIM) (14) into a sub-local 
site execution plan for retrieval of data respon- 
sive to that portion of said local site execution 
plan (LEP) received from said data information 
manager (DIM) (14) from each of said con- 
trolled local databases. 

4. The network of any of claims 1 - 3, character- 
ized in that said data information manager 
(DIM) (14) comprises a syntactic and semantic 
parser (SSP) device (18) interfacing with said 
smart data dictionary (SDD) (16) for retrieving 
data representing local schema information 
and the inter-relationships among data, and for 
parsing and validating the syntax of the global 
data request using such data retrieved from 
said smart data dictionary (SDD) (16). 

5. The network of any of claims 1 - 4, 
characterized in that said data information manager 
(DIM) (14) comprises a data optimizer unit 
(14a) interfacing with said smart data dictionary 
(SDD) (16) for retrieving data representing 
local schema information and the 
inter-relationships among data, and for minimizing the 
amount of data needed to be transferred 
among local site databases and for choosing 
the appropriate local site database for 
processing each portion of the local site execution plan 
(LEP). 

6. The network of any of claims 1-5, 
characterized in that said data information manager 
(DIM) (14) comprises local site execution plan 
(LEP) control controller interfacing with each of 
said local site databases to send each local 
site database that portion of said local site 
execution plan (LEP) necessary to extract re- 
sponsive data from each said local site 
database. 

7. The network of any of claims 1-6, character- 
ized in that said local information manager 
(LIM) (22) further includes a local controller 
(LC) (22a) for controlling the execution of that 
portion of the local site execution plan (LEP) 
sent by the data information manager (DIM) 
(14) by coordinating all internal operations. 

8. The network of claim 1, characterized by: 

- said data information manager (DIM) 
means (14) including syntactic and 
semantic parser (SSP) means (18) 
interfacing with said smart data dictionary (SDD) 
means (16) for retrieving data 
representing local schema information and the 
inter-relationships among data, and for 
parsing and validating the syntax of the 
global data request using such data 
retrieved from said smart data dictionary 
(SDD) means (16); 

- data information manager (DIM) means 
(14) further including optimizer means 
(14a) interfacing with said smart data 
dictionary (SDD) means (16) for retrieving 
data representing local schema 
information and the inter-relationships among 
data, and for minimizing the amount of 
data needed to be transferred among 
local site databases and for choosing the 
appropriate local site database for 
processing each portion of the local site 
execution plan (LEP); 

- said data information manager (DIM) 
means (14) also including local site ex- 
ecution plan (LEP) control means inter- 
facing with each of said local site 
databases to send each local site 
database that portion of said local site 
execution plan (LEP) necessary to ex- 
tract responsive data from each said lo- 
cal site database; 

- at least one local information manager 
(LIM) means (22) controlling data flow to 
and from at least two local databases 
and adapted to decompose that portion 
of said local site execution plan (LEP) 
received from said data information 
manager (DIM) means (14) into a sub-local 
site execution plan for retrieval of data 
responsive to that portion of said local 
site execution plan received from said 
data information manager (DIM) means 
(14) from each of said controlled local 
databases; and 

- said local information manager (LIM) 
means (22) further includes local 
controller (LC) means (22a) for controlling the 
execution of that portion of the local site 
execution plan sent by the data 
information manager (DIM) means (14) by 
coordinating all internal operations. 

9. The network of any of claims 1-8, 
characterized in that said local controller (LC) (22a) 
operates synchronously. 

10. The network of any of claims 1-8, 
characterized in that said local controller (LC) (22a) 
operates asynchronously. 

11. A method for controlling and directing the 
transmission of user generated global data 
query to individual ones of a plurality of 
databases and for receiving and integrating the 
requested data received from the databases 
into a single response and for transmitting the 
integrated single response to the requesting 
user (12) in a computer data network having a 
communications medium commonly 
connecting the plurality of databases containing data 
with the plurality of users (12) each capable of 
generating a global data query for accessing 
and retrieving data from said databases in 
accord with a single global query protocol, 
characterized by the steps of: 

- creating a smart data dictionary (SDD) 
local site database profile containing data 
representing schema, data distribution, 
local site configuration and inter-site 
relationships of data among the local site 
databases in the network, for each local 
site database in the network; 

- communicating with said smart data 
dictionary (SDD) local site database profile 
to retrieve data therefrom for 
decomposing the global data query into a local-site 
execution plan (LEP) for retrieval of data 
from each local site database having 
data responsive to the global data query 
in accord with the data contained in said 
smart data dictionary (SDD) local site 
database profile; 

- decomposing the global data query into 
a local-site execution plan (LEP) for 
retrieval of data from each local site 
database having data responsive to the 
global data query in accord with the data 
contained in said smart data dictionary 
(SDD) local site database profile; 

- transmitting that portion of said local-site 
execution plan (LEP) to be executed to 
the appropriate local site database for 
execution; 

- receiving data from each local site 
database responsive to said local-site 
execution plan (LEP) and creating a global 
response database containing such 
responsive data received from each local 
site database; and 

- providing the user (12) access to the 
global response database in accord with 
the single global query protocol. 

12. The method of claim 11, characterized by the 
steps of: 

- creating for each of said local site 
databases a plurality of local information 
manager (LIM) means (22), each 
communicating with said smart data dictionary 
(SDD) local site database profile for 
controlling data flow to and from a specified 
local site database in the network in 
response to that portion of said local-site 
execution plan (LEP) received by the 
local site database; and 

- generating, in accord with the data 
contained in said smart data dictionary 
(SDD) means, a data retrieval request for 
execution by any other local site 
databases necessary to complete that 
portion of said local-site execution plan 
received by it for execution. 

13. The method of claim 11 or 12, characterized in 
that the step of decomposing the global query 
further comprises: 

- parsing and validating the syntax of the 
global query; 

- identifying the appropriate local site 
database containing the requested data 
by interfacing with the smart data 
dictionary (SDD) database to retrieve 
information about the local query protocol 
and data semantic schema established 
for each individual local site database 
and generating a plurality of local site 
database queries for retrieving 
responsive data from each of the local site 
databases containing such responsive 
data; 

- optimizing the plurality of local site 
queries to generate a local site execution 
plan (LEP) that optimizes the total time 
for executing the global query by 
minimizing the amount of data needed to be 
transferred among local sites and by 
choosing the appropriate home local site 
for processing the local site execution 
plan (LEP); 

- coordinating the execution of that portion 
of the local site execution plan (LEP) that 
applies to each of the local site 
databases by: 

(a) sending each local information 
manager (LIM) means (22) that portion of 
said local site execution plan (LEP) 
applicable to data requests from its 
associated local site database; 

(b) sending each local information 
manager (LIM) means (22) a request to 
execute a local reduction plan; 

(c) sending each local information 
manager (LIM) means (22) a request to 
replicate relevant responsive data fragments; 

(d) sending each local information 
manager (LIM) means (22) a request to 
execute unsynchronously a local data 
query on its associated local site database; 

(e) sending each local information 
manager (LIM) (22) except the local 
information manager (LIM) (22) for the home site 
database, a request to synchronously 
send data responsive to the local data 
query request to the local information 
manager (LIM) (22) for the home site; 
and 

(f) archiving the received data for 
transmission to the requesting user (12). 